NSF PAR Search | NSF Public Access Repository

UDP: Utility-Driven Fetch Directed Instruction Prefetching

https://doi.org/10.1109/ISCA59077.2024.00089

Oh, Surim; Xu, Mingsheng; Khan, Tanvir Ahmed; Kasikci, Baris; Litz, Heiner (June 2024, IEEE)

Full Text Available
RPG²: Robust Profile-Guided Runtime Prefetch Generation

https://doi.org/10.1145/3620665.3640396

Zhang, Yuxuan; Sobotka, Nathan; Park, Soyoon; Jamilan, Saba; Khan, Tanvir Ahmed; Kasikci, Baris; Pokam, Gilles A; Litz, Heiner; Devietti, Joseph (April 2024, ACM)

Data cache prefetching is a well-established optimization to overcome the limits of the cache hierarchy and keep the processor pipeline fed with data. In principle, accurate, well-timed prefetches can sidestep the majority of cache misses and dramatically improve performance. In practice, however, it is challenging to identify which data to prefetch and when to do so. In particular, data can be easily requested too early, causing eviction of useful data from the cache, or requested too late, failing to avoid cache misses. Competition for limited off-chip memory bandwidth must also be balanced between prefetches and a program's regular "demand" accesses. Due to these challenges, prefetching can both help and hurt performance, and the outcome can depend on program structure, decisions about what to prefetch and when to do it, and, as we demonstrate in a series of experiments, program input, processor microarchitecture, and their interaction as well. To try to meet these challenges, we have designed the RPG2 system for online prefetch injection and tuning. RPG2 is a pure-software system that operates on running C/C++ programs, profiling them, injecting prefetch instructions, and then tuning those prefetches to maximize performance. Across dozens of inputs, we find that RPG2 can provide speedups of up to 2.15×, comparable to the best profile-guided prefetching compilers, but can also respond when prefetching ends up being harmful and roll back to the original code, something that static compilers cannot do. RPG2 improves prefetching robustness by preserving its performance benefits, while avoiding slowdowns.
Full Text Available
Online Code Layout Optimizations via OCOLOS

https://doi.org/10.1109/MM.2023.3274758

Zhang, Yuxuan; Khan, Tanvir Ahmed; Pokam, Gilles; Kasikci, Baris; Litz, Heiner; Devietti, Joseph (July 2023, IEEE Micro)

Full Text Available
OCOLOS: Online COde Layout OptimizationS

https://doi.org/10.1109/MICRO56248.2022.00045

Zhang, Yuxuan; Khan, Tanvir Ahmed; Pokam, Gilles; Kasikci, Baris; Litz, Heiner; Devietti, Joseph (October 2022, International Symposium on Microarchitecture (MICRO))

Full Text Available
Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications

https://doi.org/10.1109/MICRO56248.2022.00017

Khan, Tanvir Ahmed; Ugur, Muhammed; Nathella, Krishnendra; Sunwoo, Dam; Litz, Heiner; Jiménez, Daniel A.; Kasikci, Baris (October 2022, International Symposium on Microarchitecture (MICRO))

Modern data center applications experience frequent branch mispredictions – degrading performance, increasing cost, and reducing energy efficiency in data centers. Even the state-of-the-art branch predictor, TAGE-SC-L, suffers from an average branch Mispredictions Per Kilo Instructions (branch-MPKI) of 3.0 (0.5-7.2) for these applications since their large code footprints exhaust TAGE-SC-L’s intended capacity. In this work, we propose Whisper, a novel profile-guided mechanism to avoid branch mispredictions. Whisper investigates the in-production profile of data center applications to identify precise program contexts that lead to branch mispredictions. Corresponding prediction hints are then inserted into code to strategically avoid those mispredictions during program execution. Whisper presents three novel profile-guided techniques: (1) hashed history correlation which efficiently encodes hard-to-predict correlations in branch history using lightweight Boolean formulas, (2) randomized formula testing which selects a locally optimal Boolean formula from a randomly selected subset of possible formulas to predict a branch, and (3) the extension of Read-Once Monotone Boolean Formulas with Implication and Converse Non-Implication to improve the branch history coverage of these formulas with minimal overhead. We evaluate Whisper on 12 widely-used data center applications and demonstrate that Whisper enables traditional branch predictors to achieve a speedup close to that of an ideal branch predictor. Specifically, Whisper achieves an average speedup of 2.8% (0.4%-4.6%) by reducing 16.8% (1.7%-32.4%) of branch mispredictions over TAGE-SC-L and outperforms the state-of-the-art profile-guided branch prediction mechanisms by 7.9% on average.
Full Text Available
APT-GET: profile-guided timely software prefetching

https://doi.org/10.1145/3492321.3519583

Jamilan, Saba; Khan, Tanvir Ahmed; Ayers, Grant; Kasikci, Baris; Litz, Heiner (March 2022, EuroSys)

Full Text Available
Thermometer: profile-guided BTB replacement for data center applications

https://doi.org/10.1145/3470496.3527430

Song, Shixin; Khan, Tanvir Ahmed; Shahri, Sara Mahdizadeh; Sriraman, Akshitha; Soundararajan, Niranjan K; Subramoney, Sreenivas; Jiménez, Daniel A.; Litz, Heiner; Kasikci, Baris (June 2022, International Symposium on Computer Architecture (ISCA))

Full Text Available
PDede: Partitioned, Deduplicated, Delta Branch Target Buffer

https://doi.org/10.1145/3466752.3480046

Soundararajan, Niranjan K; Braun, Peter; Khan, Tanvir Ahmed; Kasikci, Baris; Litz, Heiner; Subramoney, Sreenivas (October 2021, 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’21))
Full Text Available
DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling

Khan, Tanvir Ahmed; Neal, Ian; Pokam, Gilles; Mozafari, Barzan; Kasikci, Baris (July 2021, Proceedings of the Symposium on Operating Systems Principles)

Full Text Available
AlloyFL: a fault localization framework for Alloy

https://doi.org/10.1145/3468264.3473116

Khan, Tanvir Ahmed; Sullivan, Allison; Wang, Kaiyuan (January 2021, The Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering)

Full Text Available
